PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition
Abstract
Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies solely focus on the recognition of cropped text line images, ignoring the error caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they either are limited to simple layouts or require very detailed annotations including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which is more robust and flexible when dealing with complex layouts including multi-directional and curved text lines. Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines. Extensive experiments conducted on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
Related articles
SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition, that can be optimized end-t...
End-to-end weakly-supervised semantic alignment
We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semanti...
Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model
When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recogni...
Towards End-to-End Speech Recognition
Standard automatic speech recognition (ASR) systems follow a divide and conquer approach to convert speech into text. Alternately, the end goal is achieved by a combination of sub-tasks, namely, feature extraction, acoustic modeling and sequence decoding, which are optimized in an independent manner. More recently, in the machine learning community deep learning approaches have emerged which al...
End-to-End Text Recognition with Hybrid HMM Maxout Models
The problem of detecting and recognizing text in natural scenes has proved to be more challenging than its counterpart in documents, with most of the previous work focusing on a single part of the problem. In this work, we propose new solutions to the character and word recognition problems and then show how to combine these solutions in an end-to-end text-recognition system. We do so by levera...
Journal
Journal title: International Journal of Computer Vision
Year: 2022
ISSN: 0920-5691, 1573-1405
DOI: https://doi.org/10.1007/s11263-022-01654-0